376 results found.
Written
Corpus,
Language Type:
Bilingual
Languages:
Ch'ol, Maya, Mazatec, Mixtec, Otomi, Spanish, Nahuatl
Availability:
Freely Available
License:
UNAM
Size:
850000 words
Production Status:
Newly created-in progress
Use:
Corpus Creation/Annotation
- Paper title: CPLM, a Parallel Corpus for Mexican Languages: Development and Interface
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Gerardo Sierra Martínez | CPLM: Corpus Paralelo de Lenguas Mexicanas | /N |
Documentation:
Spanish
Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
Size:
30,000 Other
Production Status:
Newly created-finished
Use:
Opinion Mining/Sentiment Analysis
- Paper title: HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Luis Chiruzzo | Spanish Humor Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
Apache 2.0
Size:
916552 tokens
Production Status:
Newly created-in progress
Use:
Acquisition
- Paper title: Developing NLP Tools with a New Corpus of Learner Spanish
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sam Davidson | Corpus Of Written Spanish - L2 and Heritage speakers (COWS-L2H) | /N |
Documentation:
Yes, in English
Written
Negation detector,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Not Available
License:
Size:
None Other
Production Status:
Newly created-in progress
Use:
Negation processing
- Paper title: Detecting Negation Cues and Scopes in Spanish
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Roser Morante | ES-NegDetect | /N |
Documentation:
None
Written
Lexicon,
Language Type:
Multilingual
Languages:
English, French, Italian, Portuguese, Romanian, Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
900 KByte
Production Status:
Newly created-finished
Use:
Lexicon Creation/Annotation
- Paper title: Automatically Building a Multilingual Lexicon of False Friends With No Supervision
- Paper track: Terminology/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ana Sabina Uban | FalseFriendsLexicon | /N |
Documentation:
https://github.com/ananana/false_friends_resource/blob/master/README.md
Written
Treebank,
Language Type:
Monolingual
Languages:
Afrikaans, Akkadian, Amharic, Ancient Greek, Arabic, Armenian, Assyrian, Bambara, Basque, Belarusian, Bhojpuri, Breton, Bulgarian, Buryat, Cantonese, Catalan, Chinese, Classical Chinese, Coptic, Croatian, Czech, Danish, Dutch, English, Erzya, Estonian, Faroese, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hindi English, Hungarian, Indonesian, Irish, Italian, Japanese, Karelian, Kazakh, Komi Permyak, Komi Zyrian, Korean, Kurmanji, Latin, Latvian, Lithuanian, Livvi, Maltese, Marathi, Mbya Guarani, Moksha, Naija, North Sami, Norwegian, Old Church Slavonic, Old French, Old Russian, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Skolt Sami, Slovak, Slovenian, Spanish, Swedish, Swedish Sign Language, Swiss German, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Upper Sorbian, Urdu, Uyghur, Vietnamese, Warlpiri, Welsh, Wolof, Yoruba
Availability:
Freely Available
License:
Various
Size:
25 million words
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joakim Nivre | Universal Dependencies | /N |
Documentation:
https://universaldependencies.org
Multimodal/Multimedia
Corpus,
Language Type:
Bilingual
Languages:
English, Spanish
Availability:
Freely Available
License:
Open Data Commons Attribution License (ODC-BY) v1.0
Size:
5000000 words
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: EMPAC: an English–Spanish Corpus of Institutional Subtitles
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | José Manuel Martínez Martínez | EuroparlTV Multimedia Parallel Corpus (EMPAC) | /N |
Documentation:
There is documentation in English which will be released together with the corpus and its toolkit.
Written
Lexicon,
Language Type:
Multilingual
Languages:
Bulgarian, Catalan, Chinese, Dutch, English, Estonian, Finnish, Italian, Portuguese, Slovenian, Spanish, Swedish, Thai, Turkish
Availability:
Freely Available
License:
Open Source
Size:
41 411 senses for Bulgarian, 35 820 for Swedish Other
Production Status:
Newly created-in progress
Use:
Word Sense Disambiguation
- Paper title: A Parallel WordNet for English, Swedish and Bulgarian
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Krasimir Angelov | GF WordNet | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic, Chinese, Czech, English, Finnish, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish
Availability:
Freely Available
License:
CC-BY-SA
Size:
300 KByte
Production Status:
Newly created-finished
Use:
Emotion Recognition/Generation
- Paper title: How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Hiroshi Kanayama | Parallel Sentiment | /N |
Documentation:
For 19 languages (ar,cs,de,en,es,fi,fr,hi,id,it,ja,ko,pl,pt,ru,sv,th,tr,zh)
Written
Corpus,
Language Type:
Multilingual
Languages:
Catalan, English, Spanish
Availability:
Freely Available
License:
MIT License
Size:
2000 sentences
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Marta R. Costa-jussà | GeBioCorpus_v2 | /N |
Documentation:
None




